ABSTRACT

A specific application of a general paradigm
described by R. D. Cook (1986) and R. McCulloch (1985) in assessing
local influence is given. Snow geese flock size is estimated as "X"
by an observer and "Y" by a photograph. "Y" is believed to be the
true flock size. The problem is to obtain true flock size "Z" for
flocks not photographed but with a size estimated as "W" by the same
observer. Predictive distribution of flock sizes is discussed. Four
sample graphs or plots of data are presented. (SLD)

LOCAL PREDICTIVE INFLUENCE
by 

Michael Lavine 
University of Minnesota 

0. Introduction 

This paper gives a specific application of a general paradigm that
was described by Cook (1986) and McCulloch (1985). Let M represent the
ingredients of a statistical problem, $M = (\text{model}, \text{data})$, where the model
consists of a set of sampling distributions and, for Bayesians, a set of
prior distributions on the sampling distributions. An analysis
technique T maps each M into an answer: $T(M) = a$, where a might be a
parameter estimate, a confidence interval, a probability or any other
type of inference.

Let M be a function of a vector $\omega$, where $\omega_0$ is a standard and other
values of $\omega$ represent perturbations of the standard. For example, in a
regression setting, $\omega$ may be an n-vector of case weights, an n-vector of
perturbations in the observations, or an $n \times p$ matrix of perturbations in
the covariates. For these examples, $\omega_0$ would be the vector of all 1's,
the 0 vector, and the 0 matrix, respectively.

Let D be a discrepancy function between pairs of answers, where
$D(a_1, a_2) \in \mathbb{R}$. The function D measures the influence that a perturbation
scheme has on the outcome of the analysis. Cook (1986) suggests that we
often want to examine the function

$$ h(\omega) = D\bigl[\,T(M(\omega_0)),\ T(M(\omega))\,\bigr] $$

for $\omega$'s in a neighborhood around $\omega_0$.

Many useful choices for D will satisfy $D(a_1, a_2) \ge 0$ and $D(a, a) = 0$.
Assume, from now on, that these conditions are met and therefore
that h has a local minimum at $\omega = \omega_0$. The shape of h at $\omega_0$ is an
indicator of how drastically the inference changes as a function of
$\omega$, at least locally.

When h is twice differentiable the shape of h at $\omega_0$ can be studied
through the curvature, which in turn can be studied through the
curvature in one direction at a time. Any vector $\omega$ can be written as
$\omega = r \cdot d$ where r is a scalar and d is a unit vector. The curvature $C_d$ in
the direction d is defined to be

$$ C_d = \left.\frac{d^2 h(\omega)}{dr^2}\right|_{r=0} $$

If the maximum curvature, $\sup_d C_d$, is large then small changes in $\omega$
can make large changes in the inference. On the other hand, a small
maximum curvature is evidence that the analysis is robust to small
changes in M.
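
As a rough illustration of the definition above, the following Python sketch estimates a directional curvature by central differences and finds the maximum curvature from a Hessian. The toy discrepancy h, the matrix H, and the function names are assumptions made only for this example; they are not taken from Cook (1986) or from this paper.

```python
import numpy as np

def directional_curvature(h, omega0, d, step=1e-3):
    """Central-difference estimate of d^2 h(omega0 + r d) / dr^2 at r = 0."""
    d = np.asarray(d, dtype=float)
    d = d / np.linalg.norm(d)  # curvature is defined for unit directions
    return (h(omega0 + step * d) - 2.0 * h(omega0) + h(omega0 - step * d)) / step**2

def max_curvature(hessian):
    """Largest directional curvature and its direction, from a symmetric Hessian."""
    eigvals, eigvecs = np.linalg.eigh(hessian)  # eigenvalues in ascending order
    return eigvals[-1], eigvecs[:, -1]

# Hypothetical discrepancy with a minimum at omega0 = 0:
# h(omega) = 0.5 * omega^t H omega, whose Hessian is H.
H = np.array([[2.0, 0.5], [0.5, 1.0]])
h = lambda omega: 0.5 * omega @ H @ omega
omega0 = np.zeros(2)

c_max, d_max = max_curvature(H)
print("sup_d C_d:", c_max)
print("C_d along d_max:", directional_curvature(h, omega0, d_max))
```

For a twice differentiable h, the curvature in direction d is the quadratic form of the Hessian at the standard, so the supremum over unit vectors is its largest eigenvalue.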

The remaining sections of this paper show how to compute $C_d$ and
$\sup_d C_d$ for one particular type of analysis, perturbation scheme and
discrepancy function.



1. Framework



Let the data consist of independent random variables $Y_1, \ldots, Y_n$ and
p-dimensional covariates $X_1, \ldots, X_n$. Assume that the normal linear
model with different case weights applies, i.e.,

$$ Y_i \sim N(X_i^t \beta,\ \sigma_i^2). $$

Let X be the matrix $(X_1, X_2, \ldots, X_n)^t$ so the model can be written

$$ Y \sim N(X\beta,\ \sigma^2 S) $$

where $\beta$ is the $p \times 1$ vector of regression coefficients, $\sigma^2$ is a positive
scalar, and S is a positive-definite diagonal matrix. A standard
assumption is that all the case weights are equal. Let
$\omega = (\omega_1, \ldots, \omega_n)^t$ be a vector representing changes from identical case
weights, so that the diagonal of S is $(1/(1+\omega_1), \ldots, 1/(1+\omega_n))$. The 0
vector is $\omega_0$.

Let the prior be the usual improper, non-informative prior
proportional to $\sigma^{-2}\, d\beta\, d\sigma^2$, and suppose that the goal of the analysis is
to compute a predictive density for a future random variable Z at known
covariate w that satisfies

$$ Z \sim N(w^t \beta,\ \sigma^2). $$

The Kullback-Leibler directed divergence between two densities f and g is
defined to be $I(f,g) = \int \ln\bigl(f(x)/g(x)\bigr)\, f(x)\, dx$. Let the discrepancy
function D be the Kullback-Leibler divergence, so that $h(\omega) = I(f_0, f_\omega)$
where $f_0$ is the predictive density computed with equal weights and $f_\omega$ is
the predictive density computed with weights $(1+\omega_i)$.

By the linear transformation $X^* = S^{-1/2}X$ and $Y^* = S^{-1/2}Y$ we get the new
model $Y^* \sim N(X^*\beta,\ \sigma^2 I)$ that has the same weight for every case.

The distribution of Z given w, X and Y is the Student distribution
$St(n-p,\ w^t\hat\beta,\ (1+v)s^2)$ where p is the dimension of $\beta$,
$\hat\beta = (X^{*t}X^*)^{-1}X^{*t}Y^*$, $v = w^t(X^{*t}X^*)^{-1}w$, $s^2 = Y^{*t}QY^*/(n-p)$,
$Q = I - X^*(X^{*t}X^*)^{-1}X^{*t}$ is the orthogonal projection operator parallel to
the column space of $X^*$ (that is, onto its orthogonal complement) and the
distribution St(a,b,c) has density proportional to

$$ dz\,\bigl[1 + (z-b)^2/ac\bigr]^{-(a+1)/2} $$

(Geisser (1965), Johnson and Geisser (1982)).
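
The predictive parameters can be computed directly from these definitions. The sketch below is a minimal illustration, assuming S = diag(1/(1+ω_i)) as in the text so that each transformed row is scaled by sqrt(1+ω_i); the function name `predictive_params` and the small data set are hypothetical and are not the Snow Geese data.

```python
import numpy as np

def predictive_params(X, Y, w, omega=None):
    """Return (df, location, scale) of St(n-p, w^t beta_hat, (1+v) s^2)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float)
    n, p = X.shape
    omega = np.zeros(n) if omega is None else np.asarray(omega, dtype=float)
    root_w = np.sqrt(1.0 + omega)       # diagonal of S^{-1/2}
    Xs = X * root_w[:, None]            # X* = S^{-1/2} X
    Ys = Y * root_w                     # Y* = S^{-1/2} Y
    XtX_inv = np.linalg.inv(Xs.T @ Xs)
    beta_hat = XtX_inv @ Xs.T @ Ys
    v = w @ XtX_inv @ w
    resid = Ys - Xs @ beta_hat          # equals Q Y*
    s2 = resid @ resid / (n - p)        # Y*^t Q Y* / (n-p), since Q is idempotent
    return n - p, w @ beta_hat, (1.0 + v) * s2

# Hypothetical illustration data: an intercept column plus one covariate.
x_demo = np.array([20.0, 50.0, 80.0, 120.0, 200.0, 300.0])
X_demo = np.column_stack([np.ones_like(x_demo), x_demo])
Y_demo = np.array([25.0, 45.0, 90.0, 110.0, 230.0, 280.0])
w_demo = np.array([1.0, 100.0])

print(predictive_params(X_demo, Y_demo, w_demo))                    # equal weights
print(predictive_params(X_demo, Y_demo, w_demo,
                        omega=np.array([0, 0, 0, 0, 0, -0.5])))     # downweight last case
```

Calling the function with `omega=None` gives the standard analysis; any other `omega` gives the perturbed predictive distribution whose divergence from the standard one is measured by h.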

By interchanging integration and differentiation and after some
tedious calculus we see that

$$ C_d = d^t (M_1 + M_2 + M_3 + M_4)\, d $$

where $M_1$, $M_2$, $M_3$, and $M_4$ are each rank one matrices. They are defined
in terms of $z^t = (z_1, \ldots, z_n) = w^t (X^t X)^{-1} X^t$ and the vector of residuals
$QY = r = (r_1, \ldots, r_n)^t$. The four matrices are

$$ M_1 = \frac{n-p}{2(n-p+3)(1+v)^2}\, [z \circ z][z \circ z]^t $$

$$ M_2 = \frac{-(n-p)}{(n-p+3)(1+v)\,Y^t Q Y}\, [z \circ z][r \circ r]^t $$

$$ M_3 = \frac{n-p}{2(n-p+3)(Y^t Q Y)^2}\, [r \circ r][r \circ r]^t $$

$$ M_4 = \frac{(n-p)(n-p+1)}{(n-p+3)(1+v)(Y^t Q Y)}\, [r \circ z][r \circ z]^t $$

where $\circ$ denotes elementwise multiplication. Section 3 sketches a proof
of this result.

The direction that maximizes the second derivative is the
eigenvector corresponding to the largest eigenvalue of $M_1 + M_2 + M_3 + M_4$.
Since each summand has rank 1 the sum has at most rank 4. Thus
there is only a four dimensional space of weight changes that affect the
Kullback-Leibler divergence of the predictive density, at least locally.
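
The four matrices and the maximizing direction can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the author's code: the function names and the data are hypothetical, and the sum is symmetrized before the eigen-decomposition because M2 as written is not symmetric (symmetrizing leaves the quadratic form d^t M d unchanged).

```python
import numpy as np

def curvature_matrices(X, Y, w):
    """Return (M1, M2, M3, M4) evaluated at equal case weights."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float)
    n, p = X.shape
    nu = n - p
    XtX_inv = np.linalg.inv(X.T @ X)
    z = X @ XtX_inv @ w                  # z^t = w^t (X^t X)^{-1} X^t
    r = Y - X @ (XtX_inv @ X.T @ Y)      # residual vector QY
    v = w @ XtX_inv @ w
    rss = r @ r                          # Y^t Q Y
    zz, rr, rz = z * z, r * r, r * z     # elementwise products
    M1 = nu / (2 * (nu + 3) * (1 + v) ** 2) * np.outer(zz, zz)
    M2 = -nu / ((nu + 3) * (1 + v) * rss) * np.outer(zz, rr)
    M3 = nu / (2 * (nu + 3) * rss ** 2) * np.outer(rr, rr)
    M4 = nu * (nu + 1) / ((nu + 3) * (1 + v) * rss) * np.outer(rz, rz)
    return M1, M2, M3, M4

def max_curvature_direction(X, Y, w):
    """Largest curvature and the unit direction d_max that attains it."""
    M = sum(curvature_matrices(X, Y, w))
    M_sym = 0.5 * (M + M.T)              # same quadratic form as M
    eigvals, eigvecs = np.linalg.eigh(M_sym)
    return eigvals[-1], eigvecs[:, -1]

# Hypothetical illustration data (not the Snow Geese data).
x_demo = np.array([20.0, 50.0, 80.0, 120.0, 200.0, 300.0, 150.0, 60.0])
X_demo = np.column_stack([np.ones_like(x_demo), x_demo])
Y_demo = np.array([25.0, 45.0, 90.0, 110.0, 230.0, 280.0, 140.0, 70.0])
w_demo = np.array([1.0, 100.0])

c_max, d_max = max_curvature_direction(X_demo, Y_demo, w_demo)
print("maximum curvature:", c_max)
print("d_max:", np.round(d_max, 3))
```

The sign of an eigenvector is arbitrary, so only the relative signs and magnitudes of the coordinates of `d_max` carry information about which cases are influential.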
2. Example

For a numerical example consider, as does Cook (1986), the Snow
Geese data for observer 1 from Weisberg (1985). The data are X = flock
size estimated by the observer and Y = flock size determined from a
photograph. We believe Y to be the true flock size. We are interested
in true flock size Z for flocks which have not been photographed but
whose sizes have been estimated as w by the same observer. Figure 1 is
a scatterplot of the data.

This is a calibration problem. Aitchison and Dunsmore (1975) show
that if

1) the conditional distribution of $X_i$ given $Y_i$, $\beta$ and $\sigma^2$ is
$N(\beta_0 + \beta_1 Y_i,\ \sigma^2)$,

2) the conditional distribution of w given Z, $\beta$ and $\sigma^2$ is
$N(\beta_0 + \beta_1 Z,\ \sigma^2)$,

3) the conditional distribution of Z given Y is
$St(n-3,\ \bar{Y},\ (1+1/n)\sum(Y_i-\bar{Y})^2/(n-3))$ and

4) the prior for $\beta$ and $\sigma^2$ is proportional to $\sigma^{-2}\, d\beta\, d\sigma^2$

then the predictive distribution for Z given X, Y and w is $St(n-2, a, b)$

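where

$$ a = \bar{Y} + (w - \bar{X})\,\frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}, $$

$$ b = \frac{\mathrm{RSS}}{n-2}\left(1 + \frac{1}{n} + \frac{(w - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right) $$

and RSS is the residual sum of squares from the regression of Y on X.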
Geisser (1985) points out that the Aitchison and Dunsmore result is
identical to the predictive distribution for Z given X, Y and w if

1') the conditional distribution of $Y_i$ given $X_i$, $\beta$ and $\sigma^2$ is
$N(\beta_0 + \beta_1 X_i,\ \sigma^2)$,

2') the conditional distribution of Z given w, $\beta$ and $\sigma^2$ is
$N(\beta_0 + \beta_1 w,\ \sigma^2)$ and

4') the prior for $\beta$ and $\sigma^2$ is proportional to $\sigma^{-2}\, d\beta\, d\sigma^2$.

Therefore we can solve the calibration problem as a straightforward
linear regression prediction problem by reversing the roles of X and Y.
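
As a small check of this equivalence, the sketch below computes the Student predictive parameters for a simple regression of Y on X in two ways: from the usual closed-form sums of squares and from the matrix form St(n-p, w^t beta_hat, (1+v)s^2) of Section 1 with an intercept column. The closed-form expressions are the standard simple-regression ones and are assumed here; the function names and data are hypothetical, not the Snow Geese data.

```python
import numpy as np

def simple_regression_predictive(x, y, w):
    """Closed-form (df, a, b) for regressing y on x and predicting at x = w."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    slope = sxy / sxx
    a = y.mean() + (w - x.mean()) * slope
    rss = np.sum((y - y.mean() - slope * (x - x.mean())) ** 2)
    b = rss / (n - 2) * (1 + 1 / n + (w - x.mean()) ** 2 / sxx)
    return n - 2, a, b

def matrix_predictive(x, y, w):
    """Same quantities from the general linear-model formulas with p = 2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(x)), x])
    w_vec = np.array([1.0, w])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (len(x) - 2)
    return len(x) - 2, w_vec @ beta_hat, (1 + w_vec @ XtX_inv @ w_vec) * s2

# Hypothetical observer counts (x) and photo counts (y).
x_demo = np.array([20.0, 50.0, 80.0, 120.0, 200.0, 300.0])
y_demo = np.array([25.0, 45.0, 90.0, 110.0, 230.0, 280.0])

for w0 in (30.0, 100.0, 300.0):
    print(w0, simple_regression_predictive(x_demo, y_demo, w0),
          matrix_predictive(x_demo, y_demo, w0))
```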

Let's consider predicting true flock size for three values of
estimated flock size, say $w \in \{30, 100, 300\}$. For each value of w we can
find $d_{\max}$, the direction that maximizes $C_d$. Figure 8.2 is a plot of the
coordinates of $d_{\max}$ for each value of w as a function of observer count.
Each coordinate of $d_{\max}$ corresponds to one data case. A large
coordinate indicates a case that would cause a large change in the
predictive distribution if its weight were changed slightly.

These plots are similar to a plot by Cook of the coordinates of $d_{\max}$
as a function of observer count. Cook treated $\sigma$ as known and used a
discrepancy function that depends only on point estimates of $\beta$. The
main difference between his plot and our plots is in the value for the
point where X = 500. In Cook's analysis that point corresponded to the
largest coordinate of $d_{\max}$ and would have been the most influential
under a set of small weight changes. In our analysis the influence of
that point depends on the value of the covariate.

Another interesting feature is that for w = 30 the biggest change in
the discrepancy function comes when the points at X = 500 and X = 250 get
weight changes of the same sign. For w = 300 the biggest change comes
when those points get weight changes of opposite signs. This effect may
arise because for w = 300 changing the weights with opposite signs will
make a large change in the location of the predictive distribution. For
w = 30 changing the weights with the same signs will make a large change
in the variance of the predictive distribution.

APPENDIX B



3. Computation of Curvature 

This appendix gives a rough outline and a few intermediate
calculations for proving the result in Section 1. Let r be a scalar and
$d = (d_1, \ldots, d_n)^t$ be a unit vector. Define

$$ S = \mathrm{diag}(1 + r d_1,\ \ldots,\ 1 + r d_n). $$



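Under the linear model $Y \sim N(X\beta,\ \sigma^2 S^{-1})$ with prior $\sigma^{-2}\, d\beta\, d\sigma^2$ the
predictive distribution for a future observable Z with known covariate w
is $St(n-p,\ w^t\hat\beta,\ (1+v)s^2)$ where

X is $n \times p$
$X^* = S^{1/2} X$
$Y^* = S^{1/2} Y$
$\hat\beta = (X^{*t}X^*)^{-1}X^{*t}Y^*$
$v = w^t (X^{*t}X^*)^{-1} w$
$s^2 = Y^{*t} Q Y^*/(n-p)$
and $Q = I - X^*(X^{*t}X^*)^{-1}X^{*t}$.

Here S holds the case weights themselves, so it is the inverse of the
matrix S used in Section 1.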

Let $f_r$ be the predictive distribution of Z given above. We want to
compute

$$ \left.\frac{\partial^2}{\partial r^2}\, I(f_0, f_r)\right|_{r=0} $$

where I is defined in Section 1.
Let $A = (1+v)s^2$, $A_0 = A|_{r=0}$, $B = (z - w^t\hat\beta)^2$ and $B_0 = B|_{r=0}$.
The first step in computing $C_d$ is to differentiate and evaluate at r = 0
inside the integral. The derivatives of terms involving only $A_0$ and
$B_0$ are 0 because $A_0$ and $B_0$ do not depend on r. Terms involving only A can
come outside of the integral. Letting ' denote differentiation with
respect to r we get

$$ C_d = -\frac{n-p}{2}\left.\frac{AA'' - (A')^2}{A^2}\right|_{r=0}
 + \frac{n-p+1}{2}\int f_0(z)\left.\frac{((n-p)A+B)((n-p)A''+B'') - ((n-p)A'+B')^2}{((n-p)A+B)^2}\right|_{r=0} dz. $$



Note that

$$ \left.\frac{f_0(z)\,dz}{((n-p)A+B)^2}\right|_{r=0}
 = \frac{(n-p+2)(n-p)}{(n-p+3)(n-p+1)(1+v_0)^2 (Y^t Q_0 Y)^2}\; g(z)\,dz $$

where g is the Student $(n-p+4,\ w^t\hat\beta_0,\ (n-p)A_0/(n-p+4))$ density and a
subscript 0 indicates evaluation at r = 0. Multiplying out the numerator
of the integrand gives the integral term as

$$ \frac{(n-p+2)(n-p)}{2(n-p+3)(1+v_0)^2 (Y^t Q_0 Y)^2}
 \Bigl[ (n-p)^2 AA'' + (n-p)A\!\int B'' g(z)\,dz
 + (n-p)A''\!\int B\, g(z)\,dz + \int BB'' g(z)\,dz
 - (n-p)^2 (A')^2 - 2(n-p)A'\!\int B' g(z)\,dz
 - \int (B')^2 g(z)\,dz \Bigr]_{r=0}. $$

Next evaluate B and its derivatives.

$\int B\, g(z)\,dz\,|_{r=0} = \mathrm{var}(g) = (1+v_0)\,Y^t Q_0 Y/(n-p+2)$.

$\int B' g(z)\,dz = 0$ because the integral is an odd central moment of a
symmetric density.

$B''$ does not involve z and comes outside of the integral. Using
$((X^t S X)^{-1})' = -(X^t S X)^{-1}(X^t S X)'(X^t S X)^{-1}$ (Rogers (1980)) and
$(X^t S X)' = X^t D X$ where $D = \mathrm{diag}(d_1, \ldots, d_n)$ yields

$$ B''|_{r=0} = 2\,\bigl(w^t (X^t X)^{-1} X^t D Q_0 Y\bigr)^2. $$



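$$ \int (B')^2 g(z)\,dz\,\Big|_{r=0} = 4\,(w^t\hat\beta')^2\big|_{r=0}\cdot \mathrm{var}(g)
 = 4\,\bigl(w^t (X^t X)^{-1} X^t D Q_0 Y\bigr)^2\,(1+v_0)\,Y^t Q_0 Y/(n-p+2) $$

and hence

$$ C_d = (A')^2\big|_{r=0}\cdot\frac{(n-p)^3}{2(n-p+3)(1+v_0)^2 (Y^t Q_0 Y)^2}
 + \bigl(w^t (X^t X)^{-1} X^t D Q_0 Y\bigr)^2\cdot\frac{(n-p+1)(n-p)}{(n-p+3)(1+v_0)(Y^t Q_0 Y)}. $$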



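Evaluating $A'$ at r = 0 and substituting back into $C_d$ yields $C_d$ as the
sum of four terms.

$$ C_d = \frac{n-p}{2(n-p+3)(1+v_0)^2}\,\bigl(w^t (X^t X)^{-1} X^t D X (X^t X)^{-1} w\bigr)^2 $$

$$ \quad - \frac{n-p}{(n-p+3)(1+v_0)\,Y^t Q_0 Y}\,\bigl(w^t (X^t X)^{-1} X^t D X (X^t X)^{-1} w\bigr)\,\bigl(Y^t Q_0 D Q_0 Y\bigr) $$

$$ \quad + \frac{n-p}{2(n-p+3)(Y^t Q_0 Y)^2}\,\bigl(Y^t Q_0 D Q_0 Y\bigr)^2 $$

$$ \quad + \frac{(n-p+1)(n-p)}{(n-p+3)(1+v_0)(Y^t Q_0 Y)}\,\bigl(w^t (X^t X)^{-1} X^t D Q_0 Y\bigr)^2 $$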

Let $e = Q_0 Y$, the vector of residuals.

Let $m = X(X^t X)^{-1} w$.

Let $\circ$ denote elementwise multiplication. Then

$$ C_d = d^t (M_1 + M_2 + M_3 + M_4)\, d $$

where

$$ M_1 = \frac{n-p}{2(n-p+3)(1+v_0)^2}\,(m \circ m)(m \circ m)^t $$

$$ M_2 = \frac{-(n-p)}{(n-p+3)(1+v_0)\,Y^t Q_0 Y}\,(m \circ m)(e \circ e)^t $$

$$ M_3 = \frac{n-p}{2(n-p+3)(Y^t Q_0 Y)^2}\,(e \circ e)(e \circ e)^t $$

$$ M_4 = \frac{(n-p+1)(n-p)}{(n-p+3)(1+v_0)(Y^t Q_0 Y)}\,(e \circ m)(e \circ m)^t $$
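
One way to sanity-check the closed form is to compare it with a finite-difference second derivative of a numerically integrated Kullback-Leibler divergence. The sketch below does this under stated assumptions: the data are synthetic, the function names, grid size and step r are arbitrary choices for the illustration, and the integral is approximated by a plain trapezoid rule.

```python
import math
import numpy as np

def predictive(X, Y, w, weights):
    """(df, location, scale) of the Student predictive under the given case weights."""
    n, p = X.shape
    root = np.sqrt(weights)
    Xs, Ys = X * root[:, None], Y * root       # X* = S^{1/2} X, Y* = S^{1/2} Y
    G = np.linalg.inv(Xs.T @ Xs)
    beta = G @ Xs.T @ Ys
    resid = Ys - Xs @ beta
    return n - p, w @ beta, (1 + w @ G @ w) * (resid @ resid) / (n - p)

def log_student(z, a, b, c):
    """Log density of St(a, b, c), proportional to [1 + (z-b)^2/(a c)]^{-(a+1)/2}."""
    const = math.lgamma((a + 1) / 2) - math.lgamma(a / 2) - 0.5 * math.log(a * math.pi * c)
    return const - 0.5 * (a + 1) * np.log1p((z - b) ** 2 / (a * c))

def kl_from_standard(X, Y, w, d, r, grid_points=200001, half_width=80.0):
    """I(f_0, f_r) by the trapezoid rule, where f_r uses weights 1 + r d_i."""
    a, b0, c0 = predictive(X, Y, w, np.ones(len(Y)))
    _, br, cr = predictive(X, Y, w, 1 + r * d)
    span = half_width * math.sqrt(c0)
    z = np.linspace(b0 - span, b0 + span, grid_points)
    lf0, lfr = log_student(z, a, b0, c0), log_student(z, a, br, cr)
    y = (lf0 - lfr) * np.exp(lf0)
    return (z[1] - z[0]) * (y.sum() - 0.5 * (y[0] + y[-1]))

def closed_form_curvature(X, Y, w, d):
    """d^t (M1 + M2 + M3 + M4) d written out with m and e as defined above."""
    n, p = X.shape
    nu = n - p
    G = np.linalg.inv(X.T @ X)
    m = X @ G @ w
    e = Y - X @ (G @ X.T @ Y)
    v0, rss = w @ G @ w, e @ e
    dmm, dee, dem = d @ (m * m), d @ (e * e), d @ (e * m)
    return (nu / (2 * (nu + 3) * (1 + v0) ** 2) * dmm ** 2
            - nu / ((nu + 3) * (1 + v0) * rss) * dmm * dee
            + nu / (2 * (nu + 3) * rss ** 2) * dee ** 2
            + (nu + 1) * nu / ((nu + 3) * (1 + v0) * rss) * dem ** 2)

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(10, 300, size=12)
X = np.column_stack([np.ones(12), x])
Y = 5 + 0.9 * x + rng.normal(0, 20, size=12)
w = np.array([1.0, 100.0])
d = rng.normal(size=12)
d /= np.linalg.norm(d)

r = 1e-2
# h(0) = 0, so the central second difference reduces to (h(r) + h(-r)) / r^2.
numeric = (kl_from_standard(X, Y, w, d, r) + kl_from_standard(X, Y, w, d, -r)) / r ** 2
exact = closed_form_curvature(X, Y, w, d)
print("finite-difference curvature:", numeric)
print("closed-form curvature:      ", exact)
```

The two numbers should agree up to the discretization error of the finite difference, which shrinks as r is reduced.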